SAM Doc : Installing SAM Update-12
This page last changed on Sep 07, 2011 by wlapka.
This page describes how to install and configure the SAM/Nagios node type from scratch.
1. Environment
Disable SELinux in /etc/selinux/config:
    SELINUX=disabled
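As a quick way to confirm the setting above, the following sketch reads the configured mode back from the file. The helper name selinux_config_mode is made up for this illustration and is not part of SAM or SELinux tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAM): print the value of the last
# uncommented SELINUX= line in the given file.
# Defaults to /etc/selinux/config when no argument is given.
selinux_config_mode() {
    config_file="${1:-/etc/selinux/config}"
    # SELINUXTYPE= lines and commented-out lines do not match this pattern.
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$config_file" | tail -n 1
}
```

Calling selinux_config_mode without arguments should print "disabled" once the file has been edited. The configured mode is only applied at boot, so the running mode reported by getenforce may differ until the host is restarted.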
2. Requirements
You need to install a host certificate in order to secure the Nagios web portal. The certificate should be placed in the standard location:
    ls -l /etc/grid-security/host*
    -rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
    -r-------- 1 root root  887 Oct 28 19:25 /etc/grid-security/hostkey.pem
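The permissions shown in the listing can be checked with a small script. This is only a sketch: check_mode is a hypothetical helper, and the expected permission strings are simply the ones from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: compare the permission string of a file, as printed
# in the first ten characters of `ls -l`, with an expected value.
check_mode() {
    file="$1"
    expected="$2"
    actual=$(ls -ld -- "$file" | cut -c1-10)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file is $actual"
    else
        echo "WARN: $file is $actual, expected $expected"
        return 1
    fi
}
```

Example calls, using the paths from this section:
check_mode /etc/grid-security/hostcert.pem -rw-r--r--
check_mode /etc/grid-security/hostkey.pem -r--------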
Check that the certificate can be used as an SSL client certificate:
    openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
    SSL client : Yes
If you plan to use the SAM DB (i.e. NCG_TOPOLOGY_USE_SAM or NCG_REMOTE_USE_SAM set to true) you need to request access to SAM PI from your Nagios host. Details on enabling access are maintained by the SAM team here. In the request you should provide the machine address(es) and specify that you require access under the "EGEE-SA1 Monitoring Profile".
3. Repositories
Packages:
Manually installed:
Modifications to original repo files:
SAM repository:
    [egi-sam]
    name=EGI SAM repo
    baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
    enabled=1
    gpgcheck=0
    protect=1
    priority=10
Remove the old lcg-CA repository, if installed:
4. Repository priorities
Install yum-priorities:
    yum install yum-priorities
Modify repository files:
5. Package installation
    yum install lcg-CA
    yum install httpd
    yum install nagios    # make sure that nagios from EGI SAM repository is installed
    yum --exclude=\*saga\* --exclude=\*SAGA\* groupinstall 'glite-UI (production - x86_64)'
    yum install egee-NAGIOS
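After the installation it can be useful to confirm that nothing was skipped. The sketch below compares a list of installed package names with the names passed to yum above. missing_packages is a hypothetical helper, and the package names are assumed to match the names used in the yum commands.

```shell
#!/bin/sh
# Hypothetical helper: read installed package names from standard input
# (one per line) and print every required name, given as an argument,
# that is not in that list.
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # -F: fixed string, -x: whole line must match, -q: no output
        printf '%s\n' "$installed" | grep -Fxq -- "$pkg" || echo "$pkg"
    done
}
```

On the Nagios host the installed names can be supplied by rpm, for example:
rpm -qa --qf '%{NAME}\n' | missing_packages lcg-CA httpd nagios egee-NAGIOS
Empty output means all listed packages are present. This does not show which repository the nagios package came from.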
6. Yaim configuration
6.1. NGI instance
Yaim configuration, for ops VO this time:
    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    VOS="dteam ops"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
    VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
    VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
    VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"
    RB_HOST=skurut2.cesnet.cz # irrelevant, RB is unsupported
    VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # set to your NGI WMSes
    VO_OPS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # set to your NGI WMSes
    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=ngi
    NCG_PROBES_TYPE=local
    NCG_VO=ops
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NAGIOS_NSCA_PASS=MY_PASS
    # NGI/ROC Nagios
    COUNTRY_NAME=Croatia
    NAGIOS_NCG_ENABLE_CRON=true
    NCG_GOCDB_ROC_NAME=NGI_HR
    NCG_TOPOLOGY_USE_GOCDB=false
    NCG_TOPOLOGY_USE_ENOC=false
    NCG_TOPOLOGY_USE_LDAP=false
    NCG_REMOTE_USE_SAM=false
    NCG_REMOTE_USE_NAGIOS=false
    NCG_REMOTE_USE_ENOC=false
    NCG_TOPOLOGY_USE_SAM=false
    NCG_TOPOLOGY_USE_ATP=true
    NCG_TOPOLOGY_ATP_ROOT_URL="http://grid-monitoring.cern.ch/atp"
    # DB data
    MYSQL_ADMIN="MY_MYSQL_PASS"
    DB_PASS="MY_MRS_PASS"
    MYEGI_ADMIN_NAME="Admin Name"
    MYEGI_ADMIN_EMAIL="admin@address.hr"
    MYEGI_DEFAULT_PROFILE="ROC"
    MYEGI_REGION="NGI_HR"
    NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS,GLEXEC"
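Before running Yaim it may help to check that site-info.def actually defines the variables you expect. The following is a minimal sketch: missing_yaim_vars is a hypothetical helper, and the variable names in the example call are taken from the configuration above rather than from an authoritative list of mandatory Yaim variables.

```shell
#!/bin/sh
# Hypothetical helper: print every variable name (second and later
# arguments) that has no NAME=... assignment in the given file.
missing_yaim_vars() {
    def_file="$1"
    shift
    for name in "$@"; do
        grep -Eq "^[[:space:]]*${name}=" "$def_file" || echo "$name"
    done
}
```

Example call:
missing_yaim_vars site-info.def NAGIOS_HOST NAGIOS_ROLE NCG_VO NCG_PROBES_TYPE NAGIOS_NSCA_PASS
Empty output means every listed variable is assigned somewhere in the file. The check does not validate the values.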
Run Yaim:
    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS
On a UI box, with your dteam credential, run:
    myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"
6.2. Site instance, remote-only probes
Yaim configuration:
    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    VOS="dteam ops"
    VO_DTEAM_VOMS_SERVERS="'vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=site
    NCG_PROBES_TYPE=remote
    NCG_VO=dteam
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NCG_REMOTE_USE_NAGIOS=true
    NAGIOS_NSCA_PASS=MY_PASS
Run Yaim:
    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-NAGIOS
6.3. Site instance, all probes
Yaim configuration:
    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    VOS="dteam ops"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
    VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
    VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
    VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"
    RB_HOST=skurut2.cesnet.cz
    VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"
    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=site
    NCG_PROBES_TYPE=remote,local
    NCG_VO=dteam
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NCG_REMOTE_USE_NAGIOS=true
    NAGIOS_NSCA_PASS=MY_PASS
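With this many variables it is easy to assign one twice while copying blocks between instances. Assuming site-info.def is read as a shell file, a later assignment would override an earlier one without any warning. The sketch below lists variable names that are assigned more than once. duplicate_yaim_vars is a hypothetical helper, not a Yaim tool.

```shell
#!/bin/sh
# Hypothetical helper: print each variable name that is assigned on more
# than one line of the given file. Commented-out lines are ignored.
duplicate_yaim_vars() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort | uniq -d
}
```

Example call:
duplicate_yaim_vars site-info.def
Empty output means no variable is assigned twice. Only the first assignment on a line is considered.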
Run Yaim:
    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS
On a UI box, with your dteam credential, run:
    myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"
6.4. VO feed instance
Yaim configuration, for cms VO this time:
    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    RB_HOST=skurut2.cesnet.cz # irrelevant, RB is unsupported
    VOS="dteam ops cms"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    VO_CMS_VOMS_SERVERS="'vomss://voms.cern.ch:8443/voms/cms?/cms/'"
    VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
    VO_CMS_VOMSES="'cms lcg-voms.cern.ch 15002 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch cms 24' 'cms voms.cern.ch 15002 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch cms 24'"
    VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_CMS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
    VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
    VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"
    VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # set to your NGI WMSes
    VO_OPS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # set to your NGI WMSes
    VO_CMS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # set to your NGI WMSes
    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=vo
    NCG_PROBES_TYPE=local
    NCG_VO=cms
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NAGIOS_NSCA_PASS=MY_PASS
    # VO Nagios
    NAGIOS_NCG_ENABLE_CRON=true
    NCG_TOPOLOGY_USE_SAM=false
    NCG_TOPOLOGY_USE_GOCDB=false
    NCG_TOPOLOGY_USE_ENOC=false
    NCG_TOPOLOGY_USE_LDAP=false
    NCG_REMOTE_USE_SAM=false
    NCG_REMOTE_USE_NAGIOS=false
    NCG_REMOTE_USE_ENOC=false
    NCG_USE_ATP_VO_FEED=true
    NCG_TOPOLOGY_ATP_ROOT_URL="http://grid-monitoring.cern.ch/atp"
    # DB data
    MYSQL_ADMIN="MY_MYSQL_PASS"
    DB_PASS="MY_MRS_PASS"
    MYEGI_ADMIN_NAME="Admin Name"
    MYEGI_ADMIN_EMAIL="admin@address.hr"
    MYEGI_DEFAULT_PROFILE="ROC"
    MYEGI_REGION="NGI_HR"
    NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS"
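The example passwords MY_PASS, MY_MYSQL_PASS and MY_MRS_PASS are placeholders and should be replaced before the file is used. A minimal sketch for spotting any that were left in place follows. placeholder_values is a hypothetical helper, and the list of placeholder strings is just the three that appear in the examples on this page.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every assignment in the
# given file whose value is still one of the example placeholders.
# Returns non-zero when no placeholder is found (plain grep behaviour).
placeholder_values() {
    grep -En '="?(MY_PASS|MY_MYSQL_PASS|MY_MRS_PASS)"?[[:space:]]*$' "$1"
}
```

Example call:
placeholder_values site-info.def
No output means none of the three placeholders remain as a complete value.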
Run Yaim:
    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS
On a UI box, with your dteam credential, run:
    myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"
6.5. VO instance
Installation notes for the VO instance were kindly provided by Gonçalo Borges (NGI_IBERGRID): https://wiki.egi.eu/wiki/VO_Services/VO_Service_Availability_Monitoring.
7. Additional configuration
7.1. Robot certificates
Starting from Update-09, SAM supports the use of robot certificates instead of MyProxy credentials. If your CA supports robot certificates, we suggest switching to them, as they are easier to maintain. Robot certificates also provide better availability, because SAM no longer depends on the availability of the MyProxy server. In order to use robot certificates set the following YAIM variables:
    NCG_USE_ROBOT_CERT=true
    # Robot cert and key can be different for each VO
    # and standard Yaim VO notation is used
    VO_OPS_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem
    VO_OPS_ROBOT_KEY=/etc/nagios/globus/robot-key.pem
    VO_DTEAM_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem-dteam
    VO_DTEAM_ROBOT_KEY=/etc/nagios/globus/robot-key.pem-dteam
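A simple sanity check for the robot certificate settings is to confirm that every configured path points to a readable file. The sketch below does only that. missing_robot_files is a hypothetical helper, and it assumes each VO_<VO>_ROBOT_CERT and VO_<VO>_ROBOT_KEY assignment sits on its own line without a trailing comment.

```shell
#!/bin/sh
# Hypothetical helper: extract the VO_*_ROBOT_CERT and VO_*_ROBOT_KEY
# paths from a site-info.def-style file and print those that do not
# exist or are not readable by the current user.
missing_robot_files() {
    sed -n \
        -e 's/^[[:space:]]*VO_[A-Z0-9_]*_ROBOT_CERT=//p' \
        -e 's/^[[:space:]]*VO_[A-Z0-9_]*_ROBOT_KEY=//p' "$1" |
    tr -d "\"'" |
    while IFS= read -r path; do
        [ -r "$path" ] || echo "$path"
    done
}
```

Example call:
missing_robot_files site-info.def
Empty output means all configured files are readable. The check does not verify that a key matches its certificate, or that the files are readable by the account that runs the probes.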
7.2. ACE support in MyEGI
Currently this is available only for the central MyEGI instance. YAIM configuration:
    MYEGI_ACE=true
7.3. Monitoring gLExec services
The gLExec service requires the pilot role in the VOMS proxy certificate. In order to monitor gLExec services make sure that you have permission for the pilot role in your VO. In the Yaim configuration set the following variables:
    NCG_HASH_CONFIG_PROFILES=<role_name>,GLEXEC
    NCG_PROFILE_FQAN_GLEXEC=/<vo_name>/Role=pilot
where <role_name> is the name of your role in capital letters. Correct setting for NGI instances:
    NCG_HASH_CONFIG_PROFILES=NGI,GLEXEC
    NCG_PROFILE_FQAN_GLEXEC=/ops/Role=pilot
7.4. Setting alternative SE for metric org.sam.WN-RepRep
Starting from release Update-07, it is possible to specify more than one replication SE for the WN replica test org.sam.WN-RepRep. Static and dynamic mechanisms are available and can be combined. In order to define a static list of comma-separated hostnames set the following Yaim variable:
    JOBSUBMIT_WN_SE_REP=se1[,se2,se3...]
The dynamic list is filled with SEs defined on the Nagios instance that recently passed the org.sam.SRM-All set of tests. In order to use the dynamic list set the following Yaim variable:
    JOBSUBMIT_WN_SE_REP_FILE=filename
The filename must be given without a path. The org.sam.(CREAM)CE-JobState metric(s) take at most 3 hosts from the file and, if JOBSUBMIT_WN_SE_REP was defined, append them to the static list. On the WN, org.sam.WN-RepRep tries to replicate to the SEs in the provided order until a replication succeeds. The metric returns CRITICAL if the file could not be replicated to any of the SEs. This fixes https://tomtools.cern.ch/jira/browse/SAM-442.
7.5. Setting alternative BDII for metric org.sam.SRM-All
Metric org.sam.SRM-All uses the sam-bdii.cern.ch top BDII by default. In order to make tests less dependent on the CERN top BDII, it is suggested to set an alternative BDII. To do so, create a localdb file (e.g. /etc/ncg/ncg-localdb.d/srm.conf).
There are two options:
1. use a different top BDII:
    MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!your.top.bdii
2. start using site BDIIs:
    MODIFY_METRIC_ATTRIBUTE!org.sam.SRM-All!SITE_BDII!--ldap-uri
7.6. Setting alternative LFC for metrics org.sam.WN-Rep*
Metrics org.sam.WN-Rep* use the prod-lfc-shared-central.cern.ch LFC by default. In order to set an alternative LFC, create a localdb file (e.g. /etc/ncg/ncg-localdb.d/LFC.conf):
    MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-lfc!lfc.my.domain
    MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-lfc!lfc.my.domain
7.7. Monitoring Globus services
Globus services currently do not support VOs. In order to monitor Globus services, the SAM administrator has to contact all sites and request that the certificate DN be added to the grid-mapfile.
8. Installation validation
After a successful Yaim run you should be able to access the Nagios web interface at https://NAGIOS_SERVER/nagios. If you enabled local probes, first check that the MyProxy credential works by running the hr.srce.GridProxy-Get-VO metric on NAGIOS_SERVER. You can do this by forcing a scheduled check via the web interface or via the command line:
    nagios-run-check NAGIOS_SERVER hr.srce.GridProxy-Get-VO
The MyEGI interface is at https://NAGIOS_SERVER/myegi.
Check the resource BDII:
    ldapsearch -x -LLL -h NAGIOS_SERVER -p 2170 -b Mds-Vo-Name=resource,O=grid "(GlueServiceType=*-NAGIOS)" GlueServiceEndpoint
    dn: GlueServiceUniqueID=NAGIOS_SERVER_XXXXXX-NAGIOS_2937827985,Mds-Vo-name=
     resource,o=grid
    GlueServiceEndpoint: https://NAGIOS_SERVER:443/nagios
9. Known Issues
10. Problems
A description of common problems when installing SAM can be found in the Troubleshooting section.
Document generated by Confluence on Feb 27, 2014 10:19